discrete variable
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > Canada (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (5 more...)
Leveraging Inter-Layer Dependency for Post -Training Quantization
Prior works on Post-training Quantization (PTQ) typically separate a neural network into sub-nets and quantize them sequentially. This process pays little attention to the dependency across the sub-nets, hence is less optimal. In this paper, we propose a novel Network-Wise Quantization (NWQ) approach to fully leveraging inter-layer dependency. NWQ faces a larger scale combinatorial optimization problem of discrete variables than in previous works, which raises two major challenges: over-fitting and discrete optimization problem. NWQ alleviates over-fitting via a Activation Regularization (AR) technique, which better controls the activation distribution. To optimize discrete variables, NWQ introduces Annealing Softmax (ASoftmax) and Annealing Mixup (AMixup) to progressively transition quantized weights and activations from continuity to discretization, respectively. Extensive experiments demonstrate that NWQ outperforms previous state-of-the-art by a large margin: 20.24\% for the challenging configuration of MobileNetV2 with 2 bits on ImageNet, pushing extremely low-bit PTQ from feasibility to usability. In addition, NWQ is able to achieve competitive results with only 10\% computation cost of previous works.
- North America > United States > Colorado (0.04)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- Europe > Belgium > Flanders (0.04)
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- North America > United States > Gulf of Mexico > Central GOM (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (2 more...)
1. [ALL] As R3 appreciates, our paper is mainly theoretical in nature and the focus has been to present a correct
Regarding "plots are noisy and don't really support well the claim that the algorithm recovers the true Check the sharp jump in Figure 2 which is expected based on Theorem 3. Similarly, Figure 3 shows that Markov blanket can be recovered with sufficient number of observational data. NP-hard [Chickering, 1996, Learning Bayesian Networks Is NP-Complete]. Rank-2 is only used for clarity. Reviewer 2 has asked to present a case where Assumption 4 is violated. Assume that every variable can take 4 values.
Bouncy particle sampler with infinite exchanging parallel tempering
Saito, Yohei, Kimura, Shun, Takeda, Koujin
Bayesian inference is useful to obtain a predictive distribution with a small generalization error. However, since posterior distributions are rarely evaluated analytically, we employ the variational Bayesian inference or sampling method to approximate posterior distributions. When we obtain samples from a posterior distribution, Hamiltonian Monte Carlo (HMC) has been widely used for the continuous variable part and Markov chain Monte Carlo (MCMC) for the discrete variable part. Another sampling method, the bouncy particle sampler (BPS), has been proposed, which combines uniform linear motion and stochastic reflection to perform sampling. BPS was reported to have the advantage of being easier to set simulation parameters than HMC. To accelerate the convergence to a posterior distribution, we introduced parallel tempering (PT) to BPS, and then proposed an algorithm when the inverse temperature exchange rate is set to infinity. We performed numerical simulations and demonstrated its effectiveness for multimodal distribution.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)